Efficient text fingerprinting via Parikh mapping

Authors

  • Amihood Amir
  • Alberto Apostolico
  • Gad M. Landau
  • Giorgio Satta
Abstract

We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S′ is a substring of S, then the fingerprint of S′ is the subset φ of Σ consisting of precisely the symbols appearing in S′. In this paper we show efficient methods of answering various queries on fingerprint statistics. Our preprocessing is done in time O(n|Σ| log n log |Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ ⊆ Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n). © 2003 Elsevier B.V. All rights reserved.
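
To make the two queries concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it enumerates all O(n²) substrings instead of using the O(n|Σ| log n log |Σ|) preprocessing, and the function names (`fingerprint_stats`, `count_fingerprints_of_size`, `count_occurrences`) are hypothetical, chosen only for illustration. It assumes an "occurrence" is a (start, end) position pair in S.

```python
from collections import Counter


def fingerprint_stats(s):
    """Map each fingerprint (a frozenset of symbols) to the number of
    (start, end) substring occurrences in s that have that fingerprint.
    Brute force over all substrings; illustration only."""
    occurrences = Counter()
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])  # symbol set of s[i..j], grown one symbol at a time
            occurrences[frozenset(seen)] += 1
    return occurrences


def count_fingerprints_of_size(occurrences, k):
    """Query (1): number of distinct fingerprints with exactly k symbols."""
    return sum(1 for fp in occurrences if len(fp) == k)


def count_occurrences(occurrences, phi):
    """Query (2): number of substring occurrences whose fingerprint is phi."""
    return occurrences.get(frozenset(phi), 0)


stats = fingerprint_stats("abca")
print(count_fingerprints_of_size(stats, 2))  # 3: {a,b}, {b,c}, {a,c}
print(count_occurrences(stats, "abc"))       # 3: "abc", "abca", "bca"
```

For S = "abca", the ten substrings yield seven distinct fingerprints: three of size 1, three of size 2, and one of size 3.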

Similar resources

A Matrix q-Analogue of the Parikh Map

We introduce an extension of the Parikh mapping called the Parikh q-matrix mapping, which takes its values in matrices with polynomial entries. The morphism constructed represents a word over a k-letter alphabet as a k-dimensional upper-triangular matrix with entries that are nonnegative integral polynomials in variable q. We show that by appropriately embedding the k-letter alphabet into the (k + 1)-lette...

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to the authors’ rights, plagiarism detection is one of the critical problems in the field of text mining that many researchers are interested in. This issue is considered a serious one in higher academic institutions. There exist language-free tools which do not yield any reliable results, since the special features of every language are ignored in them. Considering the paucit...

Quantum fingerprints that keep secrets

We introduce a new type of cryptographic primitive that we call hiding fingerprinting. A (quantum) fingerprinting scheme translates a binary string of length n to d (qu)bits, typically d ≪ n, such that given any string y and a fingerprint of x, one can decide with high accuracy whether x = y. Classical fingerprinting schemes cannot hide information very well: a classical fingerprint of x that g...

A q-Matrix Encoding Extending the Parikh Matrix Mapping

We introduce a generalization of the Parikh mapping called the Parikh q-matrix encoding, which takes its values in matrices with polynomial entries. The encoding represents a word w over a k-letter alphabet as a (k + 1)-dimensional upper-triangular matrix with entries that are nonnegative integral polynomials in variable q. Putting q = 1, we obtain the morphism introduced by Mateescu, Salomaa, ...

Codifiable Languages and the Parikh Matrix Mapping

We introduce a couple of families of codifiable languages and investigate properties of these families as well as interrelationships between different families. We also develop an algorithm based on the Earley algorithm to compute the values of the inverse of the Parikh matrix mapping over a codifiable context-free language. Finally, an attributed grammar that computes the values of the Parikh mat...

Journal:
  • J. Discrete Algorithms

Volume 1

Publication date: 2003